Overview

In collaboration with Tim Bendel from Bank of America, the Trainsformers team is focusing on strengthening U.S. railroads against climate-related disruptions. This project employs advanced data analytics and geospatial technologies to assess how rising temperatures and extreme weather can damage rail infrastructure and disrupt services.

While climate catastrophes often damage regions directly in their path, their effects ripple far beyond through interconnected infrastructure networks. Understanding these secondary impacts is vital because they can exceed the initial economic damages. By integrating datasets from the National Oceanic and Atmospheric Administration (NOAA) and the U.S. Department of Transportation (USDOT), we will identify and map these indirect risks to the railway systems that play an integral role in the nation’s economic engine.

Our objective is to leverage data engineering and geospatial analytics to develop a predictive model that quantifies the vulnerability of U.S. rail networks to future climate-related disruptions. The insights gained will not only spotlight regions of indirect vulnerability but also guide infrastructural fortification efforts, ensuring that rail lines remain resilient in the face of climate uncertainty. With parallels to research published in the field, our project will navigate the complex landscape of environmental data, rail infrastructure, and economic implications.

The map below displays an extensive visualization of all rail network lines across the continental United States.

Row

Railway Network across the United States

Data Dictionary

Column

Weather Data Important Variables

  • BEGIN_YEARMONTH: The year and month when the climate event began, formatted as a six-digit number (YYYYMM).

  • BEGIN_DAY: The day of the month when the climate event started, represented by a two-digit number.

  • END_YEARMONTH: The year and month when the climate event ended, formatted as a six-digit number (YYYYMM).

  • END_DAY: The day of the month when the climate event ended, in the same format as BEGIN_DAY.

  • EVENT_TYPE: The type of climate event recorded (e.g., Tornado).

  • CZ_FIPS: The FIPS code of the county or zone where the event occurred (the state code is stored separately in STATE_FIPS).

  • DAMAGE_PROPERTY: The total property damage, in dollars, resulting from the event.

  • BEGIN_LAT: The latitude at which the event began.

  • BEGIN_LON: The longitude at which the event began.

Rail Data Important Variables

  • objectid: Record ID; a unique ID allocated to each rail segment record.

  • frfranode: An identifier for the node where the rail segment starts.

  • tofranode: An identifier for the node where the rail segment ends.

  • stateab: The state abbreviation.

  • country: The country abbreviation.

  • division: Section of the U.S. (e.g., Mid America).

  • timezone: An identifier of the time zone.

  • shape_Length: The length of each rail line segment as stored in the shape geometry (not in miles).

  • miles: The length of the segment in miles.

Column

Weather Data

BEGIN_YEARMONTH BEGIN_DAY BEGIN_TIME END_YEARMONTH END_DAY END_TIME EPISODE_ID EVENT_ID STATE STATE_FIPS YEAR MONTH_NAME EVENT_TYPE CZ_TYPE CZ_FIPS CZ_NAME WFO BEGIN_DATE_TIME CZ_TIMEZONE END_DATE_TIME INJURIES_DIRECT INJURIES_INDIRECT DEATHS_DIRECT DEATHS_INDIRECT DAMAGE_PROPERTY DAMAGE_CROPS SOURCE MAGNITUDE MAGNITUDE_TYPE FLOOD_CAUSE CATEGORY TOR_F_SCALE TOR_LENGTH TOR_WIDTH TOR_OTHER_WFO TOR_OTHER_CZ_STATE TOR_OTHER_CZ_FIPS TOR_OTHER_CZ_NAME BEGIN_RANGE BEGIN_AZIMUTH BEGIN_LOCATION END_RANGE END_AZIMUTH END_LOCATION BEGIN_LAT BEGIN_LON END_LAT END_LON EPISODE_NARRATIVE EVENT_NARRATIVE DATA_SOURCE YearMonth Total_Damage DAMAGE_PROPERTY_NUM DAMAGE_CROPS_NUM V56 V57 V58 V59 V60 V61 V62 V63 V64 V65 V66 V67 V68 V69 V70 V71 V72 V73 V74 V75 V76 V77 V78 V79 V80 V81 V82 V83 V84 V85 V86 V87 V88 V89 V90 V91 V92 V93 V94 V95 V96 V97 V98 V99 V100 V101 V102 V103 V104 V105 V106 V107 V108 V109 V110 V111
195004 28 1445 195004 28 1445 NA 10096222 OKLAHOMA 40 1950 April Tornado C 149 WASHITA NA 4/28/1950 14:45 CST 4/28/1950 14:45 0 0 0 0 250 0 NA 0 NA NA NA F3 3.4 400 NA NA NA NA 0 NA NA 0 NA NA 35.12 -99.2 35.17 -99.2 NA NA PUB 195004 250 25 0 NA NA NA NA NA
195004 29 1530 195004 29 1530 NA 10120412 TEXAS 48 1950 April Tornado C 93 COMANCHE NA 4/29/1950 15:30 CST 4/29/1950 15:30 0 0 0 0 25 0 NA 0 NA NA NA F1 11.5 200 NA NA NA NA 0 NA NA 0 NA NA 31.9 -98.6 31.73 -98.6 NA NA PUB 195004 25 2 0 NA NA NA NA NA
195007 5 1800 195007 5 1800 NA 10104927 PENNSYLVANIA 42 1950 July Tornado C 77 LEHIGH NA 7/5/1950 18:00 CST 7/5/1950 18:00 2 0 0 0 25 0 NA 0 NA NA NA F2 12.9 33 NA NA NA NA 0 NA NA 0 NA NA 40.58 -75.7 40.65 -75.47 NA NA PUB 195007 25 2 0 NA NA NA NA NA
195007 5 1830 195007 5 1830 NA 10104928 PENNSYLVANIA 42 1950 July Tornado C 43 DAUPHIN NA 7/5/1950 18:30 CST 7/5/1950 18:30 0 0 0 0 2.5 0 NA 0 NA NA NA F2 0 13 NA NA NA NA 0 NA NA 0 NA NA 40.6 -76.75 NA NA NA NA PUB 195007 2.5 2 0 NA NA NA NA NA
195007 24 1440 195007 24 1440 NA 10104929 PENNSYLVANIA 42 1950 July Tornado C 39 CRAWFORD NA 7/24/1950 14:40 CST 7/24/1950 14:40 0 0 0 0 2.5 0 NA 0 NA NA NA F0 0 33 NA NA NA NA 0 NA NA 0 NA NA 41.63 -79.68 NA NA NA NA PUB 195007 2.5 2 0 NA NA NA NA NA
195008 29 1600 195008 29 1600 NA 10104930 PENNSYLVANIA 42 1950 August Tornado C 17 BUCKS NA 8/29/1950 16:00 CST 8/29/1950 16:00 0 0 0 0 2.5 0 NA 0 NA NA NA F1 1 33 NA NA NA NA 0 NA NA 0 NA NA 40.22 -75 NA NA NA NA PUB 195008 2.5 2 0 NA NA NA NA NA

Column

Rail Data

objectid fraarcid frfranode tofranode stfips cntyfips stcntyfips stateab country fradistrct rrowner1 rrowner2 rrowner3 trkrghts1 trkrghts2 trkrghts3 trkrghts4 trkrghts5 trkrghts6 trkrghts7 trkrghts8 trkrghts9 division subdiv branch yardname passngr stracnet tracks net miles km timezone shape_Length
1 300000 348741 348746 38 15 38015 ND US 8 DMVW WESTERN EIGHTH XLINE 1 M 0.1781007 0.2866258 C 0.0031940
2 300001 338567 338686 30 87 30087 MT US 8 BNSF 0 O 0.8865854 1.4268238 M 0.0172269
3 300002 330112 330117 16 31 16031 ID US 8 EIRR SOUTHERN TWIN FALLS TWIN FALLS 1 M 0.2218197 0.3569849 M 0.0042691
4 300003 330113 330116 16 31 16031 ID US 8 EIRR SOUTHERN RAFT RIVER INDUSTRIAL SPUR RAFT RIVER IL 1 I 0.1275709 0.2053059 M 0.0024842
7 300006 312341 312373 41 35 41035 OR US 8 UP PACIFIC NORTHWEST MODOC #N 1 M 0.7004531 1.1272723 P 0.0136099
9 300008 328030 328032 16 39 16039 ID US 8 USG 0 O 0.3128218 0.5034390 M 0.0054321

Data Processing

Column

Phase 1

The first task in building the three-phase probabilistic model was to transform the raw 72-year climate event data into a format suitable for analysis. To do this, we aggregated the full dataset (Complete.data) along two dimensions: location and time. Geographically, events were grouped at the state level, which gives a broad view of how climate events are distributed. Temporally, events were grouped by month, which lets us distinguish patterns and trends over time. For each state and month, we counted the number of climate events and converted the reported damages into a consistent numeric format. This conversion was critical for the quantitative assessments in the later phases.
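The Phase 1 aggregation can be sketched as follows. This is a minimal illustration, not the team's actual code: it assumes damages arrive as strings with K/M/B suffixes (for example "25K"), and the function names `parse_damage` and `aggregate_state_month` are hypothetical. The sample records are made up and only borrow the column names from the weather data.

```python
from collections import defaultdict

# Suffix multipliers for damage strings such as "25K" or "2.5M" (assumed format).
MULTIPLIERS = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_damage(value):
    """Convert a damage string to dollars; missing values become 0.0."""
    text = "" if value is None else str(value).strip().upper()
    if text in ("", "NA"):
        return 0.0
    if text[-1] in MULTIPLIERS:
        return float(text[:-1] or 0) * MULTIPLIERS[text[-1]]
    return float(text)


def aggregate_state_month(records):
    """Count events and sum damages for each (state, YYYYMM) pair."""
    totals = defaultdict(lambda: {"events": 0, "damage": 0.0})
    for rec in records:
        key = (rec["STATE"], int(rec["BEGIN_YEARMONTH"]))
        totals[key]["events"] += 1
        totals[key]["damage"] += parse_damage(rec.get("DAMAGE_PROPERTY"))
        totals[key]["damage"] += parse_damage(rec.get("DAMAGE_CROPS"))
    return dict(totals)


# Hypothetical records shaped like the weather data columns.
records = [
    {"STATE": "PENNSYLVANIA", "BEGIN_YEARMONTH": "195007",
     "DAMAGE_PROPERTY": "25K", "DAMAGE_CROPS": "0"},
    {"STATE": "PENNSYLVANIA", "BEGIN_YEARMONTH": "195007",
     "DAMAGE_PROPERTY": "2.5K", "DAMAGE_CROPS": None},
    {"STATE": "TEXAS", "BEGIN_YEARMONTH": "195004",
     "DAMAGE_PROPERTY": "NA", "DAMAGE_CROPS": "NA"},
]
monthly = aggregate_state_month(records)
```

Keying the result by (state, month) keeps the output directly usable as the lookup table for the later phases.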

Phase 2

After aggregating the dataset, the next phase was the construction of our dependent variable. This variable indicates whether a weather event occurs within a two-year forecast horizon for a given geographic unit and time period, and it is what the model is trained to predict. To implement this, we appended a new column, labeled “y”, to our dataset.

Before adding column “y”, we had to generate a complete matrix of all possible geographic and temporal combinations, including those not represented in the original dataset. For combinations absent from the raw data, we assigned a default value of zero for both event occurrences and associated damages. This step was essential to ensure that our model accounts for periods and locations where no events were recorded.

Once the matrix was established, we derived the dependent variable record by record. We marked a “1” in column “y” if any weather event was recorded in the two-year period following the record’s date, and a “0” if none was. Where the required two years of future data were unavailable, the entry was left as Null (N/A). Two critical considerations accompanied this process:

  1. Records containing Null or N/A in the dependent variable column were excluded from the dataset before model training to ensure the integrity and applicability of our predictive analytics.

  2. If the state-level aggregation yields a value of “1” for every record, the dependent variable carries no information, and a finer geographic granularity is needed. This means adopting smaller spatial units such as ZIP codes, hexbin locations, or a latitude-longitude grid to obtain a more discerning and useful predictive model.
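The construction of “y” described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the project's implementation: the helper names `month_range` and `build_target` are hypothetical, the event counts are made-up values keyed by (state, YYYYMM), and the two-year horizon is expressed as 24 monthly periods.

```python
def month_range(start, end):
    """Yield YYYYMM integers from start to end, inclusive."""
    year, month = divmod(start, 100)
    while year * 100 + month <= end:
        yield year * 100 + month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def build_target(counts, states, start, end, horizon=24):
    """Build the full state x month grid and label each row with y."""
    months = list(month_range(start, end))
    rows = []
    for state in states:
        # Combinations missing from the raw data default to zero events.
        series = [counts.get((state, m), 0) for m in months]
        for i, m in enumerate(months):
            future = series[i + 1 : i + 1 + horizon]
            if len(future) < horizon:
                y = None  # not enough future data to label this row
            else:
                y = int(any(n > 0 for n in future))
            rows.append({"state": state, "yearmonth": m,
                         "events": series[i], "y": y})
    return rows


# Hypothetical counts; a short horizon keeps the example small.
counts = {("TX", 202001): 2, ("TX", 202006): 1}
rows = build_target(counts, ["TX", "OK"], 202001, 202012, horizon=3)

# Rows without a label are dropped before training.
training = [r for r in rows if r["y"] is not None]

# If every remaining label is 1, a finer spatial unit is needed.
needs_finer_grid = all(r["y"] == 1 for r in training)
```

The final check mirrors the second consideration: a target that is 1 everywhere cannot separate locations, so it signals a move to smaller geographic units.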

Phase 3

To improve the predictive capability of our model and add analytical depth to our dataset, Phase 3 incorporated a set of engineered features based on historical data. These features are rolling lookback metrics that reflect the frequency and severity of past weather events: the cumulative number of events and the aggregate damages over preceding time intervals. The intervals are categorized into short-, medium-, and long-term periods, each providing a distinct window into the historical pattern of weather impacts. This yields variables such as “number of events in the previous three time periods” and “total damage in the previous three time periods.” By adjusting the length of these lookback windows, we can capture both the immediate and the extended impacts of climate events, giving the model a view of recent and more distant conditions.

Our finalized dataset is therefore a matrix that combines these lookback variables with the original data, providing the foundation for the predictive model.
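A minimal sketch of the lookback features, assuming one chronologically ordered series of monthly event counts and damages per state. The function name `add_lookback_features` and the window lengths (3, 12, and 36 periods for short, medium, and long term) are illustrative assumptions, not values taken from the project.

```python
def add_lookback_features(events, damages, windows=(3, 12, 36)):
    """For each period t, sum events and damages over the w periods before t."""
    rows = []
    for t in range(len(events)):
        row = {"events": events[t], "damage": damages[t]}
        for w in windows:
            lo = max(0, t - w)  # early periods have a shorter history
            row[f"events_prev_{w}"] = sum(events[lo:t])
            row[f"damage_prev_{w}"] = sum(damages[lo:t])
        rows.append(row)
    return rows


# Hypothetical monthly series for a single state.
events = [1, 0, 2, 3]
damages = [10, 0, 5, 1]
features = add_lookback_features(events, damages, windows=(3,))
```

Each window excludes the current period, so the features only use information that would have been available at prediction time.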

Column

List of Removed Locations and Weather Events from our Data Frame
Location         Event
Hawaii           Astronomical Low Tide
Alaska           Coastal Flood
Guam             Dense Smoke
Puerto Rico      Drought
Gulf of Mexico   Dust Devil
Atlantic North   Dust Storm
Atlantic South   Freezing Fog
Hawaii waters    Funnel Cloud
E Pacific        Heat
Virgin Islands   High Surf
American Samoa   Lake-Effect Snow
Lake Superior    Lakeshore Flood
Lake St Clair    Marine Hail
Lake Ontario     Marine High Wind
Lake Erie        Marine Strong Wind
Lake Michigan    Marine Thunderstorm Wind
Lake Huron       Rip Current
                 Waterspout
                 Seiche
                 Sleet
                 Sneakerwave
                 Tsunami
                 Tropical Depression
                 Tropical Storm
                 Volcanic Ashfall

Damage Analysis

Column

Total Damage

Normalized by State Area

Column

The bar graph ranks states by the financial impact of weather events. Texas is the outlier with the highest damages, which suggests that a review of its disaster response and infrastructure resilience is warranted. The presence of Midwestern states like Iowa and Nebraska highlights the significant toll of storms in the nation’s agricultural heartland and raises questions about the interplay between climate events and economic vulnerabilities tied to agriculture and land use. Coastal and southern states, typically the focus of hurricane-related damages, are interspersed among less frequently discussed states like Illinois and Wisconsin. This distribution prompts us to consider a broader range of climate impacts beyond the obvious high-risk areas. The chart can inform not just reactive policies but also proactive investments in technology and infrastructure to fortify states against predictable damages. It also hints at the potential benefits of cross-state learning, where lower-impact states could share best practices in climate event mitigation. Overall, the analysis supports a strategic, data-informed approach to climate resilience in which resources are allocated according to where damages actually concentrate.

This normalized bar chart provides an adjusted view of the financial impact of weather events, accounting for the size of each state. This normalization allows for a more equitable comparison by illustrating the damage relative to the geographic area, rather than total damages which can be skewed by the size and economic activity of a state. From the chart, Iowa stands out with the highest damage per square mile, indicating that, when size is considered, its relative economic impact from weather events is the most substantial. This could reflect a high density of valuable assets or infrastructure within a smaller area, or an exceptional severity of weather events. States like Ohio, Mississippi, and New Jersey follow, suggesting these states, though varying in size and geography, face significant impacts from weather events relative to their area. Interestingly, larger states with significant total damages like Texas appear further down the list when the damage is adjusted per square mile, highlighting the importance of considering geographic scale in such analyses. The chart also informs us that less geographically extensive states with high population densities or substantial infrastructure—like Delaware and New Jersey—may experience high normalized damages, underscoring the potential for significant impact in smaller areas.
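The effect of normalization on the ranking can be illustrated with a small sketch. The damage totals below are hypothetical placeholders, not figures from our data, and the state areas are approximate public values in square miles; `normalize_by_area` and `rank` are illustrative helper names.

```python
# Approximate total state areas in square miles (assumed values).
AREA_SQ_MI = {"Texas": 268_596, "Iowa": 56_273, "New Jersey": 8_723}


def normalize_by_area(total_damage, area=AREA_SQ_MI):
    """Return damage per square mile for each state."""
    return {state: dmg / area[state] for state, dmg in total_damage.items()}


def rank(values):
    """Return state names ordered from highest to lowest value."""
    return sorted(values, key=values.get, reverse=True)


# Hypothetical total damages in dollars, for illustration only.
total_damage = {"Texas": 5.0e9, "Iowa": 4.0e9, "New Jersey": 0.5e9}

rank_total = rank(total_damage)
rank_normalized = rank(normalize_by_area(total_damage))
```

With these placeholder inputs the largest state leads the total ranking but drops to last once area is taken into account, which is the kind of shift the two charts are meant to expose.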

Monthly Plots

The left map illustrates the total number of weather-related events that occurred in each state during a given month, summed over the 72 years of data (1950-2022). States are color-coded by the volume of events, with darker shades indicating a higher frequency. Starting with January, this representation highlights the regions that are most active in each month, which can be critical for understanding seasonal patterns and for preparing emergency management resources accordingly.

On the right, the map displays the normalized data of weather events, taking into account the land area of each state. By normalizing these figures, we gain a proportional insight into the intensity of weather events relative to state size. This normalization allows for an equitable comparison across states, ensuring that both large and small states can be accurately assessed for their weather event density.

Use the monthly tabs below to navigate through weather event data for different times of the year. This feature allows for a quick comparative analysis to identify monthly and seasonal trends in weather-related events across the nation.

Note that the ‘Total Events’ map may show significant activity in larger states, while the ‘Normalized Events’ map can reveal which states have a higher density of events per square mile. It’s crucial to consider both perspectives when assessing the impact and preparing for future weather conditions.

Column

Total Events (monthly tabs: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec)

Column

Normalized Events (monthly tabs: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec)

Investigating Nodes

Column

Centrality Nodes Per State

Column

Our rail dataset consists of rail lines spanning all 50 states, DC, Mexico, and Canada. It contains information such as time zone, segment length in miles, and from- and to-node values. We did not otherwise subset this data, but we did narrow the scope to the continental US states only.

Network analysis studies how the nodes of a network are connected, and it applies to many domains, e.g., air travel, road traffic, and social media analytics. Betweenness centrality, the measure used here, quantifies the number of times a node acts as a bridge along the shortest path between two other nodes. While there are many centrality measures, betweenness was found to be the best fit for this project.
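To make the betweenness measure concrete, here is a minimal sketch using Brandes' algorithm on an unweighted, undirected graph. It is not the code used for the project (which may have used a library and segment lengths as weights); the `betweenness` function and the toy node IDs are assumptions for illustration. In the rail data, each edge would be a (frfranode, tofranode) pair.

```python
from collections import defaultdict, deque


def betweenness(edges, normalized=True):
    """Betweenness centrality for an undirected, unweighted graph (Brandes)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(adj)
    scale = 0.5  # each unordered pair is visited from both endpoints
    if normalized and n > 2:
        scale /= (n - 1) * (n - 2) / 2
    return {v: c * scale for v, c in score.items()}


# Toy line of five nodes: 1 - 2 - 3 - 4 - 5 (hypothetical node IDs).
toy_edges = [(1, 2), (2, 3), (3, 4), (4, 5)]
raw = betweenness(toy_edges, normalized=False)
scaled = betweenness(toy_edges)
```

In the toy line, the middle node sits on the most shortest paths and therefore gets the highest score. The normalized variant divides by the number of node pairs that exclude the node itself, which puts scores on a 0 to 1 scale.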

Locations per State with the Highest Betweenness Values
States Location
Alabama Five Points South, Birmingham, AL, 35233
Arizona 701 W Harrison St, Phoenix, AZ, 85007
Arkansas Pine Bluff, AR, 71601
California Bakersfield, CA, 93305
Colorado North Denver, Denver, CO, 80202
Connecticut Old Saybrook, CT, 06475
Delaware 1570 Porter Rd Tunnel, Bear, DE, 19701
Florida Duval County, FL, 32234
Georgia Macon, GA, 31201
Idaho Pocatello, ID
Illinois Dwight Township, IL, 60420
Indiana Wayne, Indianapolis, IN, 46234
Iowa Marshall County, IA, 50158
Kansas North Topeka West, Topeka, KS, 66608
Kentucky Lexington, KY, 40508
Louisiana Mid City North, LA, 70805
Maine 2 Newhall St, Fairfield, ME, 04937
Maryland Baltimore, MD
Massachusetts Worcester, MA, 01604
Michigan Durand, MI, 48429
Minnesota Coon Rapids, MN, 55433
Mississippi Jackson, MS, 39203
Missouri Kansas City, MO, 64053
Montana Cascade County, MT, 59404
Nebraska Blaine, NE, 68901
Nevada Humboldt County, NV, 89445
New Hampshire Whitefield, NH, 03598
New Jersey Rahway City School District, Rahway, NJ
New Mexico Valencia County, NM, 87002
New York Lakefront, Syracuse, NY, 13204
North Carolina Stanly County, NC, 28128
North Dakota Minot, ND, 58701
Ohio Columbus, OH 432155
Oklahoma McAlester, OK, 74501
Oregon Lloyd District, Portland, OR, 97232
Pennsylvania Rockville Bridge near Harrisburg, PA
Rhode Island Warwick, RI, 02886
South Carolina Columbia, SC
South Dakota Wolsey, SD, 57384
Tennessee Haywood County, TN, 38012
Texas Fort Worth, TX, 7610
Utah 400w 200 S, Salt Lake City, UT, 84101
Vermont Essex Junction, VT, 05452
Virginia Burkeville, VA, 23922
Washington 4, Auburn, WA, 98001
West Virginia Mason County, WV, 25550
Wisconsin Fox Crossing, WI, 54956
Wyoming Converse County, WY, 82633

Centrality Calculations

Column

Top Centrality Node in the US

Top Five Centrality Nodes in the US

Column

After calculating the betweenness centrality for the entire continental US, we found that the node with the highest value, 0.2258, is located in Lima, Ohio. This is significant because the Lima railroad station served as a major hub in the early twentieth century, connecting five major railroads: the Pennsylvania Railroad; the Baltimore and Ohio Railroad; the New York, Chicago and St. Louis Railroad; the Erie Railroad; and the Detroit, Toledo and Ironton Railroad. By the 1990s, all passenger rail service at the station had been discontinued. The station has since been restored and currently serves as both a museum and an office.

Based on the visualization, the top five nodes across the US are located in the same general area. Only three points are visible because some of the nodes lie very close to one another. The table below breaks down their locations.

  Nodeid Centralityscore Latitude Longitude                        Location
1 435096          0.2258 40.74490  -84.1042                 Lima, OH, 45801
2 435236          0.2236 40.74570  -84.0882                 Lima, OH, 45804
3 416842          0.2225 41.56690  -87.4178            Calumet Township, IN
4 429164          0.2223 41.07137  -85.1283 Hanna Creighton, Fort Wayne, IN
5 429161          0.2221 41.07140  -85.1302           LaRez, Fort Wayne, IN

Density of Rail Lines

Column

Rail Line Density across the United States

Rail Line Density Normalized Per State Area Across the United States

Column

The purpose of this side-by-side comparison of heatmaps is to show the difference between the total railway mileage and the railway density of each state. The heatmap on the left shows the total railway miles per US state. From our US DOT rail data, we extracted the state column and the miles column, which gives the length of each recorded railway segment. The total railway miles for a state were computed by summing the miles of all of its segments. Texas has the most railway mileage, followed by Illinois and California. For Texas and California, their large area and high volume of exported goods are likely contributing factors. Illinois’s high mileage can be explained by factors such as the state sitting at the center of many of the nation’s rail networks and Chicago being the largest US rail gateway.

The heatmap on the right shows railway density per state, computed by dividing each state’s total railway miles by its area in square miles. States with high railway mileage, like California and Texas, fall on the lower end for railway density. Some states, such as Florida, Virginia, and North Carolina, stay in a relatively similar range across the mileage and density comparisons. Midwestern and Northeastern states like Illinois, Ohio, and Pennsylvania have high railway densities, which can be attributed historically to their role as hubs for the transportation of goods and as junctions for rail lines running from East to West. New Jersey has the highest railway density, which is consistent with it being the most densely populated state and neighboring major cities like Philadelphia and New York City. The state also has substantial economic activity and a high volume of freight rail traffic.
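The two quantities behind the heatmaps can be sketched as follows. The segment values are taken from the rail data preview, while the state areas are approximate assumed figures and the function names `rail_miles_by_state` and `rail_density` are hypothetical.

```python
from collections import defaultdict


def rail_miles_by_state(segments):
    """Sum the miles of every recorded segment, grouped by state."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["stateab"]] += seg["miles"]
    return dict(totals)


def rail_density(miles_by_state, area_sq_mi):
    """Divide total rail miles by state area (miles per square mile)."""
    return {state: miles / area_sq_mi[state]
            for state, miles in miles_by_state.items()
            if state in area_sq_mi}


# Segments from the rail data preview (stateab and miles columns only).
segments = [
    {"stateab": "ND", "miles": 0.1781007},
    {"stateab": "MT", "miles": 0.8865854},
    {"stateab": "ID", "miles": 0.2218197},
    {"stateab": "ID", "miles": 0.1275709},
    {"stateab": "OR", "miles": 0.7004531},
    {"stateab": "ID", "miles": 0.3128218},
]

# Approximate state areas in square miles (assumed values).
area_sq_mi = {"ND": 70_698, "MT": 147_040, "ID": 83_569, "OR": 98_379}

miles = rail_miles_by_state(segments)
density = rail_density(miles, area_sq_mi)
```

Because the density depends directly on the area figure chosen (total area versus land area, for instance), the same area source should be used for every state.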

Column

Railway Miles by State
States Rail.Miles Density.of.Miles
Alabama 4378.5426 8.35280e-02
Arizona 2677.6307 2.34900e-02
Arkansas 3393.5617 6.38140e-02
California 9449.7252 5.77270e-02
Colorado 3846.1349 3.69490e-02
Connecticut 723.3180 1.30482e-01
Delaware 351.8919 1.41395e-01
Florida 4121.1503 6.26720e-02
Georgia 5683.6332 9.56440e-02
Idaho 2657.7008 3.21590e-02
Illinois 9997.6332 1.72630e-01
Indiana 5778.9817 1.58678e-01
Iowa 5168.5291 9.18480e-02
Kansas 6344.9123 7.71150e-02
Kentucky 3583.7934 8.86910e-02
Louisiana 3716.2512 7.09500e-02
Maine 1750.0355 4.94640e-02
Maryland 1325.4027 1.06836e-01
Massachusetts 1481.2951 1.40349e-01
Michigan 5324.2994 5.50520e-02
Minnesota 6360.7271 7.31660e-02
Mississippi 3571.2015 7.37370e-02
Missouri 5674.2102 8.14010e-02
Montana 3917.0154 2.66390e-02
Nebraska 4622.2144 5.97590e-02
Nevada 1894.2960 1.71320e-02
New Hampshire 633.0250 6.77090e-02
New Jersey 2010.2172 2.30495e-01
New Mexico 2878.6835 2.36750e-02
New York 5113.2500 9.37270e-02
North Carolina 4431.9766 8.23490e-02
North Dakota 4255.0218 6.01760e-02
Ohio 7557.9691 1.68608e-01
Oklahoma 4181.5089 5.98220e-02
Oregon 3303.5101 3.35800e-02
Pennsylvania 7455.3994 1.61883e-01
Rhode Island 182.9365 1.18414e-01
South Carolina 2939.4058 9.17980e-02
South Dakota 2410.7400 3.12610e-02
Tennessee 3820.6410 9.06560e-02
Texas 14316.9965 5.33030e-02
Utah 2700.5354 3.18100e-02
Vermont 682.4734 7.09780e-02
Virginia 4309.6317 1.00751e-01
Washington 5452.2056 7.64710e-02
West Virginia 2840.0150 1.17210e-01
Wisconsin 4587.7059 7.00450e-02
Wyoming 2571.1431 7.00450e-02

Recommendations

We propose that Bank of America take a cumulative look at the states most vulnerable to climate events, the states with the highest rail densities, and the points of highest centrality at the state and national levels. Inferences can be drawn from our climate and rail analyses, which were mostly visualized separately, but exploring their combined relationship would give a more holistic understanding of the railway routes that are vulnerable to weather events. This can assist Bank of America in analyzing the severity of climate event damage to railway lines more precisely.

This would likely require a climate event analysis at a finer scale so that we can pinpoint the rail lines most affected by different climate events. A climate event dataset with longitude and latitude for every event would make it possible to carry out this analysis at the county or ZIP code level. We were able to pinpoint the locations of top rail centrality primarily because of the longitude and latitude attributes; having the same for the weather data would allow future work to overlay the weather and rail data with much greater precision.

This analysis would streamline the company’s economic evaluation of the regions most prone to rail vulnerabilities, which would aid Bank of America in its interactions with clients. Bank of America aims to provide insights to its clients regarding loan approvals, mortgages, and potential investments, particularly in areas whose infrastructure is susceptible to disruption.

There were a few limitations that we ran into when analyzing our data:

Column

Weather Data Limitations

  1. As shown in several plots, the density of weather events is very high in certain states, such as Texas. This makes sense because Texas is a very large state, and normalizing by state area helped minimize the issue. Additionally, comparatively few weather events are recorded on the West Coast. This could be due to gaps in recording: 1950, the earliest year in the dataset, contains only a small subset of the data. It could also reflect a higher concentration of events there that are not among the top 5 most frequent weather event types. Either way, it is challenging to fully gauge the potential impact of weather events on the West Coast.
  2. Another limitation was the inability to plot the specific locations of certain weather events. Quite a few events had missing longitude and latitude, either because they were not recorded or because the event spanned a large area. This made it difficult to plot those events with a high level of accuracy.
  3. Finally, in the damage analysis, we were unable to differentiate between private personal property, commercial property, and rail lines. We therefore conducted an overall damage analysis by event type and location, but it says nothing specific about damage done to rail lines. This could be something to look into moving forward.

Rail Data Limitations

  1. The betweenness calculation, as mentioned previously, is sensitive to the extent of the network analyzed. For example, the node ranked most important in the entire United States was not ranked most important when the calculation was run on its state alone. This happens because the state-level runs do not account for rail lines that cross state boundaries: those lines were included when running the full U.S., but in a state-level run each rail line stops at the edge of the state.
  2. Additionally, running the betweenness calculation on the entire United States is computationally very expensive. For example, a run started at 1 a.m. on Friday did not finish until 2 a.m. on Monday. These extended run times restricted the approaches we could test and implement, and there may be alternative metrics that could enhance the analysis that we have not considered.
  3. After calculating the highest betweenness for the entire U.S., we found that the top node, in Lima, Ohio, is a discontinued site. We need to consider how this affects our analysis, knowing that the top node is not currently an in-service junction. Operational status is not included in our dataset, so we also need to determine how many other important nodes are not currently operational.
  4. We had trouble producing the overlay plot. Merging the two very dense datasets was difficult, and creating these plots properly would require more time.
  5. Finally, we were unable to differentiate between cargo lines and passenger lines since this was not consistently stated within the data. Further research could look into this.

Future Work

An important next step is to continue the project by mapping the probabilistic model against the rail lines. We built a probabilistic model from the climate data with our three-phase process, but we did not get to map its output against the rail network. Doing so would indicate the areas in the US with the highest predicted vulnerability.

Another analysis worth pursuing is separating high-impact and low-impact weather events into two categories. This would give a more specific understanding of the level of impact that rail infrastructure would undergo. As of now, all weather events are combined when assessing secondary impacts, yet different event types have widely varying impacts. Subdividing the events would support more situation-specific conclusions about the type of impact that weather events pose to rail infrastructure.

In terms of rail line analysis, rail lines that run across state borders deserve a closer look. When our team explored the top centrality node for each state, the neighboring states were taken out of the picture, so lines that span multiple states were cut short in the state-by-state analysis. The top centrality nodes could differ once the connecting rail lines in neighboring states are taken into account.